
   Next: C. Support for Gcc Up: Aspell .28.3 alpha A Previous: A.
   Changelog   Contents
   Subsections
     * B.1 Things that will be done real soon
     * B.2 Things that need to be done
     * B.3 Things that I would like to get done
     * B.4 Things that will be done eventually
     * B.5 Good ideas that are worth implementing
     * B.6 Things that are not likely to get implemented
     * B.7 Notes and Status of various items
          + B.7.1 Affix Compression
          + B.7.2 Extremely Large Dictionaries
          + B.7.3 General region skipping
          + B.7.4 Word skipping by context
          + B.7.5 Tablelise to_soundslike
          + B.7.6 Hidden Markov Model
     _________________________________________________________________
   
                                   B. To Do
                                       
   Words in bold indicate how you should refer to the item when
   discussing it with me or others.
   
                    B.1 Things that will be done real soon
                                       
   These items should get done within a release or two.
   
     * Handle capitalization better when storing replacement pairs.
     * Finish up support for detachable and multiple dictionaries. Right
       now aspell will not work correctly with multiple replacement
       dictionaries.
     * Totally rewrite the aspell international support. See
       http://metalab.unc.edu/kevina/aspell/international/ for more
       information.
     * Rework the aspell check function to support any number of
       filters, which will be needed for international support.
     * Namespace clean up. Choose a naming scheme for all of aspell's
       functions and classes and stick to it.
     * Write documentation for the library interface.
     * Bring back the C interface library.
       
                        B.2 Things that need to be done
                                       
   These items will eventually be implemented, as I know they are
   important; however, I am not sure when they will get done.
   
     * Add support for affix compression.
     * Figure out a way for Aspell to work better with extremely large
       dictionaries.
       
                   B.3 Things that I would like to get done
                                       
   These items will eventually be implemented. I hope to have them all
   done before I move aspell to beta testing. They are in the approximate
   order of when they will get done.
   
                    B.4 Things that will be done eventually
                                       
   I plan on doing these things eventually. It is just a matter of
   getting around to it.
   
     * Bring the suggest_ultra method back.
       
                  B.5 Good ideas that are worth implementing
                                       
   These items all sound like good ideas; however, I am not sure when I
   will get to implementing them, if ever. If you are looking for a way
   to contribute, picking up one of these ideas would be a great way to
   start. They are presented in no particular order.
   
     * Come up with a full screen terminal interface similar to Ispell's.
     * Come up with a plug-in for gEdit, the GNOME text editor.
     * Create a Debian package for both the Aspell utility and the
       library.
     * Create an RPM package for both the Aspell utility and the library.
     * Change languages (and thus dictionaries) based on the information
       in the actual document.
     * Come up with an HTML mode for spell checking.
     * Come up with a TeX mode and an nroff mode for spell checking.
     * Come up with a C mode, a C++ mode, a Perl mode, etc., for spell
       checking.
     * Come up with a mode that will skip words based on the symbols that
       (almost) always surround the word. (Word skipping by context)
     * Create two server modes for Aspell: one that uses the DICT
       protocol and one that uses the ispell -a method of communication
       on some arbitrary port.
     * Improve the C interface and documentation.
     * Come up with a Perl interface.
     * Come up with thread safe personal dictionaries.
     * Tablelise the to_soundslike method.
     * Use a Hidden Markov Model to base the suggestions not only on
       the word itself but also on the context around the word.
       
               B.6 Things that are not likely to get implemented
                                       
   These ideas are not likely to get implemented any time soon.
   
     * (None Yet)
       
                     B.7 Notes and Status of various items
                                       
B.7.1 Affix Compression

   Due to the way my spell checker currently works, implementing affix
   compression would be next to impossible. Nevertheless, I do realize
   that for some languages affix compression is very important.
   
   So to solve this dilemma I plan on having two different modes of my
   spell checker: One with affix compression that does not use soundslike
   pairs (much like ispell) and one without affix compression that does
   use soundslike.
   
   I plan to extract the affix manipulation code from Ispell with the
   help of an Ispell author. The tricky part would be getting this all
   to work properly at run time based on the dictionary used.
   
B.7.2 Extremely Large Dictionaries

   This problem stems from the way words are indexed in Aspell. It will
   get resolved when I implement the affix compression mode, as only one
   index would then be used.
   
B.7.3 General region skipping

   I want to implement this to give other people an idea of how it
   should be done, and because I am really sick of having to spell check
   through URLs and email addresses.
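
   As a rough illustration of region skipping, the sketch below marks
   URLs and email addresses as regions and only yields the words outside
   them. It is written in Python for brevity and is not Aspell code; the
   names skip_regions and words_to_check and the two patterns are
   assumptions made up for this example, and the patterns are far
   simpler than real URL or address syntax.

```python
import re

# Hypothetical patterns; real URL and address grammars are more permissive.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s<>\"]+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
WORD_RE = re.compile(r"[A-Za-z']+")


def skip_regions(text):
    """Return sorted (start, end) spans that the checker should ignore."""
    spans = [m.span() for rx in (URL_RE, EMAIL_RE) for m in rx.finditer(text)]
    return sorted(spans)


def words_to_check(text):
    """Yield the words that lie outside every skipped region."""
    spans = skip_regions(text)
    for m in WORD_RE.finditer(text):
        inside = any(s <= m.start() and m.end() <= e for s, e in spans)
        if not inside:
            yield m.group()
```

   A general version would presumably let each mode register its own
   region patterns instead of hard-coding two of them.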
   
B.7.4 Word skipping by context

   This was posted on the Aspell mailing list on January 1, 1999:
   
   I had an idea on a great general way to determine if a word should be
   skipped. Determine the words to skip based on the symbols that
   (almost) always surround the word.
   
   For example when asked to check the following C++ code:
   
          cout << "My age is: " << num << endl;
          cout << "Next year I will be " << num + 1 << endl;
          
   cout, num, and endl will all be skipped. "cout" will be skipped
   because it is always followed by a <<. "num" will be skipped because
   it is always preceded by a <<. And "endl" will be skipped because it
   is always between a << and a ;.
   
   Given the following HTML code:
   
          <table width=50% cellspacing=0 cellpadding=1>
          <tr><td>One<td>Two<td>Three
          <tr><td>1<td>2<td>3
          </table>
          
          <table cellspacing=0 cellpadding=1>
          </table>
          
   table, width, cellspacing, cellpadding, tr, and td will all be skipped
   because they are always enclosed in "<>". Now of course table and
   width would be marked as correct anyway; however, there is no harm in
   skipping them.
   
   So I was wondering if anyone on this list has any experience in
   writing this sort of context recognition code or could give me some
   pointers in the right direction.
   
   This sort of word skipping will be very powerful if done right. I
   imagine that it could replace specific spell checker modes for TeX,
   nroff, SGML, etc., because it will automatically be able to figure out
   where it should skip words. It could also probably do a very good job
   on programming language code.
   
   If you are interested in helping me out with this, or just have
   general comments about the idea, please let me know.
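
   One very simple reading of the idea in the post can be sketched as
   follows. This is a Python illustration under stated assumptions, not
   an implementation from Aspell: the function names and the min_count
   parameter are made up, the context is reduced to the single nearest
   non-space character on each side, and the rule is strict ("always")
   rather than the "(almost) always" the post asks for.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[A-Za-z]+")


def neighbour_symbols(text, start, end):
    """Return the nearest non-space characters around text[start:end].

    A side is None when that character is missing or is a letter or digit.
    """
    before = text[:start].rstrip()[-1:]
    after = text[end:].lstrip()[:1]

    def as_symbol(ch):
        return ch if ch and not ch.isalnum() else None

    return as_symbol(before), as_symbol(after)


def words_to_skip(text, min_count=2):
    """Words seen at least min_count times, always beside the same symbol."""
    befores = defaultdict(list)
    afters = defaultdict(list)
    for m in WORD_RE.finditer(text):
        before, after = neighbour_symbols(text, m.start(), m.end())
        befores[m.group()].append(before)
        afters[m.group()].append(after)

    def consistent(symbols):
        # Every occurrence must share one and the same symbol.
        return symbols[0] is not None and len(set(symbols)) == 1

    return {
        word
        for word in befores
        if len(befores[word]) >= min_count
        and (consistent(befores[word]) or consistent(afters[word]))
    }
```

   On the C++ lines above this picks out cout, num, and endl. On the
   HTML it picks out cellspacing, cellpadding, tr, and td, but not
   table, because table appears after both "<" and "/"; handling that
   case is where the "(almost) always" tolerance and multi-character
   contexts would have to come in.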
   
B.7.5 Tablelise to_soundslike

   Geoff Kuenning (one of the Ispell authors) gave me this idea. He
   suggested that I table-drive the conversion instead of hard-wiring the
   code, to encourage other people to come up with a to_soundslike method
   for their language. "My experience with ispell leads me to believe
   that if we could give fairly explicit instructions about how to design
   a metaphone algorithm for a language, there are people out there who
   are eager to do the work."
   
   This sounds like a good idea to me, and it is something I hope to
   implement eventually, as it would surely beat my current crude method
   of just deleting all the vowels for languages without a customized
   to_soundslike method.
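
   To make the table-driven idea concrete, here is a minimal Python
   sketch in which the conversion code is fixed and only the rule table
   changes per language. The table ENGLISH_RULES is entirely
   hypothetical and is not a real metaphone definition; the function
   signature is likewise an assumption for illustration, not Aspell's
   interface.

```python
# Hypothetical rule table for illustration; not a real metaphone definition.
ENGLISH_RULES = {
    "ph": "F",
    "ck": "K",
    "sh": "X",
    "c": "K",
    "q": "K",
    "z": "S",
    "a": "", "e": "", "i": "", "o": "", "u": "",
}


def to_soundslike(word, rules):
    """Convert word left to right, applying the longest matching rule."""
    longest = max(map(len, rules), default=1)
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            # No rule matched: keep the letter itself, in upper case.
            out.append(word[i].upper())
            i += 1
    return "".join(out)
```

   With this table, to_soundslike("phone", ENGLISH_RULES) and
   to_soundslike("fone", ENGLISH_RULES) give the same code, and someone
   supporting another language would only need to supply a different
   table.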
   
B.7.6 Hidden Markov Model

   Knud Haugaard Sørensen suggested this one. From his email on the
   Aspell mailing list:
   
     consider this examples.
     
     a fone number. -> a phone number.
     a fone dress. -> a fine dress.
     
     the example illustrates that the right corection might depend on
     the context of the word. So I suggest that you take a look on HMM
     to solve this problem.
     
     This might also provide a good base to include gramma correction in
     aspell.
     
     see this link http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node7.html
     
   I think it is a great idea. Unfortunately, it will probably be very
   complicated to implement. Perhaps in the far future...
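
   For readers unfamiliar with the technique, the following Python
   sketch shows the general shape of context-dependent correction using
   the Viterbi algorithm over a toy model. Everything in it is an
   assumption for illustration: the function name, the table layout, and
   above all the probabilities, which are invented and not measured from
   any text.

```python
def viterbi(observed, candidates, bigram, floor=1e-6):
    """Return the most probable intended words for the typed words.

    candidates[typed] maps each possible intended word to P(typed | word);
    bigram[(prev, word)] is P(word | prev), with floor for unseen pairs.
    """
    # best[w] = (probability, path) of the best sequence ending in w
    best = {w: (p, [w]) for w, p in candidates[observed[0]].items()}
    for typed in observed[1:]:
        step = {}
        for w, emit in candidates[typed].items():
            prob, path = max(
                (prev_p * bigram.get((prev, w), floor) * emit, prev_path)
                for prev, (prev_p, prev_path) in best.items()
            )
            step[w] = (prob, path + [w])
        best = step
    return max(best.values())[1]


# Invented toy numbers, for illustration only.
CANDIDATES = {
    "a": {"a": 1.0},
    "fone": {"phone": 0.5, "fine": 0.5},
    "number": {"number": 1.0},
    "dress": {"dress": 1.0},
}
BIGRAM = {
    ("a", "phone"): 0.1,
    ("a", "fine"): 0.1,
    ("phone", "number"): 0.2,
    ("fine", "dress"): 0.2,
}
```

   With these tables, viterbi(["a", "fone", "number"], CANDIDATES,
   BIGRAM) chooses "phone" and viterbi(["a", "fone", "dress"],
   CANDIDATES, BIGRAM) chooses "fine", matching the two examples in the
   email. A real implementation would need probabilities estimated from
   a corpus and would work with logarithms to avoid underflow on long
   sentences.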
     _________________________________________________________________
   
   
   
    Kevin Atkinson 1999-11-20
